ASoC: SOF: ipc4-mtrace: resync host_read_ptr on debugfs open#5690
ASoC: SOF: ipc4-mtrace: resync host_read_ptr on debugfs open#5690majunkier wants to merge 2 commits intothesofproject:topic/sof-devfrom
Conversation
sound/soc/sof/ipc4-mtrace.c
Outdated
| &write_ptr, sizeof(write_ptr)); | ||
| write_ptr -= write_ptr % 4; | ||
| core_data->dsp_write_ptr = write_ptr; | ||
| core_data->host_read_ptr = write_ptr; |
There was a problem hiding this comment.
but you will loose debug data between two opens?
open / read / close
fw still prints # this information might be lost?
open / read /close
There was a problem hiding this comment.
Ack @ujfalusi , I asked @majunkier to push this for review. THis is genuine problem, but not sure which way to go about. The current implementation is intentional in the sense that you cannot run mtrace-reader.py before the driver is loaded, and as the driver loads and boots the DSP, it is not possible to get first boot logs if the read pointer is reset at boot. IOW, ability to get old logs is a feature. But as Mateusz has rootcaused, this creates non-deterministic behaviour is the mtrace-reader is restarted, and this is showing up as problems in some scenarios we have in CI. It's also true mtrace-reader.py needs to be stopped before the driver can be unloaded.
Not sure how to best solve this. I don't think we sync at open. Sync at close is a possibility, but this is still not deterministic. I guess one option is to just document this behaviour and the test case (that checks whether logging is working) needs to create some DSP activity that ensures there is new logs after mtrace-reader is restarted. Ideas are welcome!
There was a problem hiding this comment.
I'm not sure, but have we tried to read the pointers directly from the slot?
I guess the issue shows up when the DSP is suspended and the the write counter got reset (also the read ptr in slot).
If we read the write/read ptr from slot on open then we will have up-to date info, or we just always read them on every time in sof_ipc4_mtrace_dfs_read(), first I would try the sync at open only. There is a reason why we only update the write ptr when fw tells that it has moved.
There was a problem hiding this comment.
so i will remove core_data->host_read_ptr = write_ptr and re-sync only dsp_write_ptr from SRAM in open().
I hope this avoid losing inter session data.
i need to validate if this will be enough
UPDATE
i have tested current changes, seems working
sound/soc/sof/ipc4-mtrace.c
Outdated
| * dsp_write_ptr (not 0) avoids re-reading stale data from | ||
| * wrapped circular buffer. | ||
| */ | ||
| core_data->host_read_ptr = core_data->dsp_write_ptr; |
There was a problem hiding this comment.
either of the two change is redundant? If you reset the pointers in open then this is not needed, but resetting might be a problem as well.
There was a problem hiding this comment.
Dropping this change as well.
4c09f20 to
98832db
Compare
| sof_mailbox_read(sdev, core_data->slot_offset + sizeof(u32), | ||
| &write_ptr, sizeof(write_ptr)); | ||
| write_ptr -= write_ptr % 4; | ||
| core_data->dsp_write_ptr = write_ptr; |
There was a problem hiding this comment.
The commit message also needs update.
But something is not quite right in the description. The dsp_write_ptr is updated for the NOTIFY_LOG_BUFFER_STATUS in sof_ipc4_mtrace_update_pos(), it is done regardless if reader is active or not, but if there is one active then it is woken up to read out the data.
There was a problem hiding this comment.
out of curiosity, can you debug this a bit further?
read the write and read ptr from the slot and then compare it with the ptr value tracked in core_data and if they differ print them out.
Something is not adding up here.
There was a problem hiding this comment.
ok, i added debug (added as a part of this PR temporary), which reveal other issue:
output catch
core0 ptr mismatch: slot r/w 0xffffffff/0xfffffffc vs cached r/w 0x0/0xfffffffc
slot_read_ptr = 0xffffffff uninitialized SRAM
slot_write_ptr = 0xfffffffc synchronized
cached host_read_ptr = 0x0 kernel memory
In sof_ipc4_mtrace_dfs_open() kernel initializes core_data->host_read_ptr = 0 in local memory but never write value to SRAM, first SRAM write is in sof_ipc4_mtrace_dfs_read() after consuming data.
so problem looks like desync kernel and dsp.
kernel core_data->host_read_ptr = 0
SRAM slot[offset+0] = 0xffffffff
DSP reads from SRAM slot_read_ptr = 0xffffffff
DSP calculates free_space = (0xffffffff - 0xfffffffc - 1) = 2 bytes
DSP thinks buffer is full so posibility of dropped logs
possible solution : Write cached 0x0 value to SRAM in sof_ipc4_mtrace_dfs_open()
@ujfalusi could you take a look if i understand it correctly ?
There was a problem hiding this comment.
looks like the fw is not initializing the slot at init, so both host_ptr and dsp_ptr is 0xffffffff and it is using these in mtrace_out [1]
As I recall when the mtrace feature was added, the fw did a reset of the ptrs on init.
There was a problem hiding this comment.
@majunkier, @kv2019i, this diff in zephyr makes for a huge difference with unpatched kernel mtrace code!
diff --git a/subsys/logging/backends/log_backend_adsp_mtrace.c b/subsys/logging/backends/log_backend_adsp_mtrace.c
index bbceefa4a29d..11c1a8542bd7 100644
--- a/subsys/logging/backends/log_backend_adsp_mtrace.c
+++ b/subsys/logging/backends/log_backend_adsp_mtrace.c
@@ -84,7 +84,13 @@ static void mtrace_init(void)
}
ADSP_DW->descs[ADSP_DW_SLOT_NUM_MTRACE].type = MTRACE_LOGGING_SLOT_TYPE(MTRACE_CORE);
+
+ struct adsp_debug_slot *slot = (struct adsp_debug_slot *)
+ ADSP_DW->slots[ADSP_DW_SLOT_NUM_MTRACE];
#endif
+
+ slot->host_ptr = 0;
+ slot->dsp_ptr = 0;
}
static size_t mtrace_out(int8_t *str, size_t len, size_t *space_left)There was a problem hiding this comment.
like this on fw boot on TGL with mtrace-reader open:
[ 0.000000] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_on: dma@7d000: initialized
[ 0.000000] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_on: dma@7c000: initialized
[ 0.000000] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: SSP0
[ 0.000000] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: SSP1
[ 0.000000] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: SSP2
[ 0.000000] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: SSP3
[ 0.000000] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: SSP4
[ 0.000000] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: SSP5
[ 0.000000] <inf> dai_intel_dmic: dai_dmic_probe: dmic_probe()
[ 0.000000] <inf> dai_intel_dmic: dai_dmic_probe: dmic_probe()
[ 0.000000] <inf> init: print_version_banner: FW ABI 0x301d001 DBG ABI 0x5003000 tags SOF:v2.14-pre-rc-517-gd088eb0747a2 zephyr:v4.3.0-6597-gd885cfebaa34 src hash 0x3e97bd3e (ref hash 0x3e97bd3e)
*** Booting Zephyr OS build v4.3.0-6597-gd885cfebaa34 ***
[ 0.000000] <inf> main: sof_app_main: SOF on intel_adsp
[ 0.000000] <inf> main: sof_app_main: SOF initialized
and when mtrace-reader started way after the DSP boot:
[ 0.000000] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_on: dma@7d000: initialized
[ 0.000000] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_on: dma@7c000: initialized
[ 0.000000] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: SSP0
[ 0.000000] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: SSP1
[ 0.000000] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: SSP2
[ 0.000000] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: SSP3
[ 0.000000] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: SSP4
[ 0.000000] <inf> dai_intel_ssp: dai_ssp_pm_runtime_en_ssp_power: SSP5
[ 0.000000] <inf> dai_intel_dmic: dai_dmic_probe: dmic_probe()
[ 0.000000] <inf> dai_intel_dmic: dai_dmic_probe: dmic_probe()
[ 0.000000] <inf> init: print_version_banner: FW ABI 0x301d001 DBG ABI 0x5003000 tags SOF:v2.14-pre-rc-517-gd088eb0747a2 zephyr:v4.3.0-6597-gd885cfebaa34 src hash 0x3e97bd3e (ref hash 0x3e97bd3e)
*** Booting Zephyr OS build v4.3.0-6597-gd885cfebaa34 ***
[ 0.000000] <inf> main: sof_app_main: SOF on intel_adsp
[ 0.000000] <inf> main: sof_app_main: SOF initialized
I think the fix is in zephyr...
There was a problem hiding this comment.
core_data->dsp_write_ptr is updated on each aging timer IPC, but there is 256ms window between DSP write and IPC. If reader opens in this window and DSP goes idle before next IPC,there wont be further writes and sof_wait_mtrace_avail() blocks until timeout. Fix by reading dsp_write_ptr directly from SRAM in open() to get current write position without waiting for the next IPC. host_read_ptr is left unchanged to preserve inter-session data. Signed-off-by: Mateusz Junkier <mateusz.junkier@intel.com>
98832db to
4b9d08c
Compare
Signed-off-by: Mateusz Junkier <mateusz.junkier@intel.com>
core_data->dsp_write_ptr is updated on each aging timer IPC, but there
is 256ms window between DSP write and IPC. If reader opens in this window
and DSP goes idle before next IPC,there wont be further writes
and sof_wait_mtrace_avail() blocks until timeout.
Fix by reading dsp_write_ptr directly from SRAM in open() to get
current write position without waiting for the next IPC. host_read_ptr
is left unchanged to preserve inter-session data.